Executive Summary

In 2010, the Massachusetts Legislature lifted the cap on the number of charter schools that the Board of Elementary and Secondary Education (BESE) can authorize in low-performing school districts. The “smart cap” requires the Department of Elementary and Secondary Education (DESE) to rank district performance (based on student outcomes) annually. Districts that fall into the lowest 10 percent of those rankings are eligible for an increase in the amount of tuition that they can pay to charter schools. Whereas the law limits district tuition payments to charter schools to 9 percent of net school spending, the smart cap raises that limit to 18 percent of net school spending in the Commonwealth’s lowest performing districts.

Until 2015, the Commonwealth ranked district performance solely according to the share of students scoring proficient in core subject areas on the Massachusetts Comprehensive Assessment System (MCAS) tests. In 2015, BESE amended the smart cap regulations and began using an additional measure, the student growth percentile (SGP), as 25 percent of the formula for determining district rankings. SGP tracks students’ progress by comparing changes in an individual’s MCAS score to those of students who scored similarly in prior years. The Commonwealth currently uses SGP for two main purposes: 1) as a factor in its overall system for holding districts accountable for student outcomes; and 2) to determine which school districts are eligible for an increase in the net school spending (NSS) cap on charter schools.

When used as one measure in a more holistic approach to assessing school outcomes, SGP can be informative. In districts that struggle to bring a majority of students to proficiency, it can provide information about whether teachers, schools, and districts are moving some students along the ladder of proficiency. However, researchers disagree as to the accuracy of the information that SGP scores provide. Like any measure of student achievement, SGP is subject to a degree of statistical error, and compared with proficiency rates alone, SGP carries a large degree of error. This is leading a growing number of researchers to suggest that SGP may not be an appropriate metric for high-stakes policy decisions.

The stakes associated with using SGP as one part of the formula for determining districts that are eligible for an increase in the charter cap are high: many of the communities in which overall student proficiency is low also have long waiting lists for charter public schools. Until 2015, when proficiency alone was the measure that the Commonwealth used to determine the lowest performing districts, districts with low overall proficiency and longer waiting lists were more likely to become eligible for an increase in the NSS cap. This has changed since SGP was incorporated into the formula for determining the lowest performing districts.

The statistical error associated with SGP has a different impact on small (lower enrollment) and large (higher enrollment) districts. When a district enrolls higher numbers of students, its SGP is more likely to cluster around what researchers call a “typical mean,” even if the district has individual schools with very low or very high SGPs. When a district enrolls fewer students, its SGP is more likely to lie far from the typical mean. Smaller districts don’t cluster around the typical mean in part because low enrollment makes their overall SGP rating more volatile.

Because low-enrollment districts have more volatile SGP scores, they tend to move in and out of the bottom 10 percent at higher rates, sometimes displacing larger districts with SGP scores that cluster around a mean, even when those larger districts have low overall student proficiency. This means that some large districts where demand for charter schools is high become ineligible for an increase in the NSS cap, even when large numbers of students are still not proficient in core subjects. Conversely, the smaller districts that enter the bottom 10 percent due to volatile SGP scores are more likely to be districts where demand for charter schools is low. They may also have higher overall proficiency rates than the larger districts they displaced in the bottom 10 percent.

Using SGP as a factor in determining district eligibility for an increase in the charter school cap also has consequences for existing and prospective charter school operators. Including SGP in the formula has led to more movement in and out of the bottom 10 percent from year to year. Because it is difficult to predict which districts will fall “in” or “out” of the bottom 10 percent annually, existing operators have difficulty predicting enrollment and school budgets. For their part, prospective operators may feel discouraged from applying for charter schools in high-demand areas that are on the cusp of the bottom 10 percent because it is unclear whether the state will be able to authorize more charter schools from year to year.

The following paper presents data on the relationship between enrollment and SGP and its impact on determining the lowest performing districts in Massachusetts. Based on these data, the authors recommend that the Commonwealth stop using SGP as a factor in determining eligibility for an increase in the charter school cap and revert to a formula that uses absolute proficiency as the sole measure. The Commonwealth should also separate determinations about increases in the net school spending cap from its overall accountability system. Other recommendations include mitigating the negative impacts of the current formula for existing and prospective charter operators and increasing transparency around the purpose and use of SGP more broadly.


The Smart Cap: An Overview

The Massachusetts Legislature established charter schools in 1993 as part of the Massachusetts Education Reform Act (MERA). Commonwealth charter schools have more autonomy in exchange for more accountability. With freedom from some of the bureaucratic constraints that can hinder district schools, the legislature hoped charters would innovate and provide new public-school options for students and families.

To date, many Massachusetts charter schools have delivered on the promise of innovation. The majority have also proved to be very high-quality academic options, especially for poor and minority students and (increasingly) English language learners and students with disabilities. Several gold-standard studies have shown that Boston’s charter schools, in particular, outperform their district counterparts in terms of standardized tests, graduation rates, number of Advanced Placement exams taken and passed, and college attendance. Another study, by Stanford University’s Center for Research on Educational Outcomes, described Boston charters as some of the highest performing public schools in the nation. It found that compared to students in surrounding district schools, Boston charter schools “added an additional 12 months of learning in reading and 13 months of learning in math each school year.”

Massachusetts Charter Schools
Authorizer: Board of Elementary and Secondary Education (takes recommendations from the Department of Elementary and Secondary Education)
Operating Commonwealth Charter Schools: 74
Unique students on waitlists as of May 2019: 25,308

Demographics                  Charters   State
English Language Learner      14.1%      10.5%
Special Education             15.5%      18.1%
Economically Disadvantaged    41.5%      31.2%

SOURCES: http://www.doe.mass.edu/charter/enrollment/fy2019/updated-waitlist.html; http://www.doe.mass.edu/charter/about.html

But in the beginning, the legislature could not predict whether charter schools would be successful. This is one reason why MERA capped the number of charter schools the state could authorize at 25. Demand for these new options, especially in urban centers, outstripped supply within a few short years. In response, the legislature modestly raised the statewide cap two more times over the next decade, but it did so in the midst of increasing antipathy toward charters, especially from the state’s two powerful teachers unions.

Opposition to charter schools had existed from the start, but opponents moved to halt charter expansion at the same time it was becoming clear the schools were popular with parents. To quell fears that charter schools would “drain” district enrollments, in 1997 the legislature designed a second charter school cap. This cap limits the amount of money (charter school “tuition”) that districts can send to charters when students choose to attend them. This cap is also called “NSS.”

In Massachusetts, when a student leaves a district for a charter school, the per-pupil allocation that she would have received in the district, from both state and local sources, follows her to the charter. The law limits the amount districts can send to charters to 9 percent of net school spending. In 2019, there is still room to establish charter schools under the statewide cap of 120 schools, but some large urban centers and “gateway” cities like Boston, Springfield, Lawrence, and Everett have reached the NSS cap. In these and other cities, there are tens of thousands of students waiting for new charter seats to become available. The statewide total of individual students on charter school waitlists was more than 25,000 as of March 2018.

The NSS cap has changed slightly over time. In 2010, responding to incentives from the federal Race to the Top initiative, the legislature instituted a “smart cap,” under which school districts whose performance on the Massachusetts Comprehensive Assessment System places them in the bottom 10 percent statewide are subject to an increase in the amount of charter school tuition they can pay. Now, when districts perform in the bottom 10 percent, their charter tuition cap increases from 9 to 18 percent of net school spending. The law states that any new charter school seats awarded under the smart cap must go to “proven providers,” operators with a proven track record of helping students achieve strong academic outcomes.
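The cap itself is simple arithmetic. The Python sketch below computes the ceiling on a district’s charter tuition under the 9 and 18 percent limits described above. The function name and the $200 million net school spending figure are hypothetical, and the sketch deliberately ignores every other statutory detail of how tuition is calculated or reimbursed.

```python
def max_charter_tuition(net_school_spending: int, in_bottom_ten_percent: bool) -> int:
    """Ceiling on a district's charter tuition payments, in whole dollars.

    Hypothetical helper: applies only the 9 percent NSS limit, raised to
    18 percent when the district ranks in the bottom 10 percent statewide.
    """
    cap_percent = 18 if in_bottom_ten_percent else 9
    # Integer arithmetic avoids floating-point rounding on dollar amounts.
    return net_school_spending * cap_percent // 100


nss = 200_000_000  # hypothetical district net school spending, in dollars
print(max_charter_tuition(nss, in_bottom_ten_percent=False))  # 18000000
print(max_charter_tuition(nss, in_bottom_ten_percent=True))   # 36000000
```

A district that exits the bottom 10 percent sees this ceiling cut in half, which is the mechanism behind the Lynn and Everett cases discussed below.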


The smart cap solved two problems in 2010: it opened up new seats in high-demand urban centers, and it increased the probability that students entering new charter schools would receive a high-quality education. But since that time, it has become clear that in some districts, the smart cap was only enough to make a small dent in charter school waiting lists. Moreover, because of the proven provider clause, charter schools in low-performing districts have been replications of existing programs, with few new (or innovative) options entering the market.

These two consequences of the cap are concerning. Also concerning, however, are state regulations that dictate how DESE determines which districts fall into the bottom 10 percent.

For the first three years of the smart cap, the state determined the bottom 10 percent based on MCAS proficiency scores alone: it ranked districts according to the percentage of students meeting and not meeting proficiency thresholds on MCAS math, English language arts, and science tests. But in 2014 the Board of Elementary and Secondary Education (BESE, the Commonwealth’s sole charter authorizer) determined that student growth scores on standardized assessments should be factored into district rankings. This was part of a broader move to incorporate growth scores into the state’s entire accountability system, a system that is distinct from the ranking process the department had been using to determine which districts are eligible for an increase in the charter cap.

Since that time, the state has used a district’s median “student growth percentile,” in conjunction with the percentage of students who meet proficiency thresholds on each test, to determine which districts fall into the bottom 10 percent of performance. The current formula weights SGP at 25 percent and proficiency at 75 percent.
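To make the weighting concrete, here is a minimal Python sketch of a 75/25 composite ranking. It assumes, hypothetically, that each district’s percent proficient and median SGP are both already on a 0 to 100 scale and are combined linearly; DESE’s actual scaling and tie-breaking rules are not described here, and the ten districts and their numbers are invented for illustration.

```python
# Hypothetical districts: (name, percent proficient, median SGP).
# These numbers are invented for illustration; they are not DESE data.
DISTRICTS = [
    ("Large city", 30.0, 55.0),
    ("Small town", 34.0, 30.0),
    ("District C", 40.0, 45.0),
    ("District D", 45.0, 50.0),
    ("District E", 50.0, 48.0),
    ("District F", 55.0, 52.0),
    ("District G", 60.0, 47.0),
    ("District H", 65.0, 53.0),
    ("District I", 70.0, 50.0),
    ("District J", 75.0, 49.0),
]


def composite(pct_proficient: float, median_sgp: float, sgp_weight: float = 0.25) -> float:
    """Linear blend of proficiency and growth (assumes both are on a 0-100 scale)."""
    return (1 - sgp_weight) * pct_proficient + sgp_weight * median_sgp


def bottom_ten_percent(districts, sgp_weight: float) -> list[str]:
    """Names of the lowest-scoring tenth of districts (at least one)."""
    ranked = sorted(districts, key=lambda d: composite(d[1], d[2], sgp_weight))
    cutoff = max(1, len(ranked) // 10)
    return [name for name, _, _ in ranked[:cutoff]]


print(bottom_ten_percent(DISTRICTS, sgp_weight=0.0))   # ['Large city']
print(bottom_ten_percent(DISTRICTS, sgp_weight=0.25))  # ['Small town']
```

In this toy example the district with the lowest proficiency leaves the bottom group once its middling SGP is blended in, and a district with higher proficiency but a low SGP takes its place, the kind of displacement described in the Executive Summary.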

BESE’s desire to consider growth as a factor in determining the lowest performing districts was well-intentioned. The board believed that districts should be rewarded when they help otherwise low-performing groups of students make progress, even when overall proficiency is low. Former BESE Vice Chair Harneen Chernow noted at the time:

“MCAS [test score] data almost always corresponds (negatively) with the socio-economic status of the district... but we are looking at improvement and innovation and change and where districts are doing good things... our goal should be to support and reinforce those outcomes.”

The idea that growth should be encouraged might be important in the context of a broader system of district accountability. However, using growth as even a small factor when determining which districts are eligible for an increase in the charter school cap runs counter to the spirit of the 2010 law: that students in districts where overall proficiency is low should have access to additional charter school options.

Data show that some districts exit the bottom 10 percent of performance because their growth scores seem strong, even when overall student proficiency remains low. In such cases, the ranking is a false boon for districts (which still need ample support to improve) and a real loss for parents and students seeking access to charter school opportunities that will not be available.

When BESE made the decision to merge the formula for determining the lowest 10 percent of districts with the state’s overall system of district accountability, it might not have fully understood the various impacts of using growth measurements for high-stakes policy decisions. When some BESE members questioned why SGP shouldn’t receive more weight in the formula for determining district performance, then-Commissioner of Elementary and Secondary Education Mitchell Chester warned against weighting student growth too heavily: “I am not recommending a larger increase in the weighting of growth, because it would start to distort the identification of schools and districts most in need of our assistance.”

In this statement, Chester captured the concerns of many charter advocates, who feared two consequences of rewarding the lowest performing districts for high growth scores: 1) many districts would still be able to fail large swaths of students who were not proficient simply because they moved some students from the very “bottom” of proficiency categories to the middle or top of the bottom; and 2) including growth as a factor would likely allow some districts where demand for charters is highest to exit the bottom 10 percent. Districts like Boston, for example, have had consistently low MCAS scores over time (especially for low-income and minority students, students with disabilities, and English language learners), but have come very close to exiting the bottom 10 percent because their SGP scores have affected overall rankings.
On both counts, charter advocates have been right. The City of Lynn, Massachusetts provides one example. Between 2015 and 2018 Lynn held various rankings within the bottom 10 percent, climbing slightly in some years due to a steadily increasing median SGP. In the 2018-19 school year, Lynn exited the bottom 10 percent mainly due to a sizeable increase in SGP, although it is still the 24th lowest performing district in the Commonwealth in terms of absolute proficiency on MCAS.

In 2018, only 33 percent of Lynn’s fourth graders and 31 percent of eighth graders reached proficiency on MCAS English language arts assessments. In math, only 33 percent of fourth graders and 23 percent of eighth graders reached proficiency. For English language learners and students with disabilities, the data are even more discouraging: on grade 4 math assessments, only 23 percent of English language learners and 7 percent of students with disabilities scored proficient on MCAS. These numbers indicate that Lynn is not even close to helping a majority of students achieve basic proficiency.

As of 2018, there were 1,920 unique students on charter school waiting lists in Lynn: 12 percent of the district’s student population is seeking access to charter schools that are, on average, higher performing than the district. However, additional charter school seats will not be available in the coming year, because the district’s NSS cap will be reduced from 18 to 9 percent. Parents in Lynn who want access to charter public schools will have to wait for the district to fail even more students before the spending cap will increase again.


Percentage of Students Not Meeting Expectations, Grade 8 MCAS English Language Arts and Math: Lynn School District, KIPP Charter Public School (Lynn, MA), and State

                          Lynn (District)   KIPP Lynn (Charter)   State
English Language Arts     20%               11%                   15%
Math                      18%               7%                    12%

SOURCE: http://profiles.doe.mass.edu


And while reduced access to high-performing schools is the most important problem the Lynn situation illustrates, the volatility of the net school spending cap creates another issue, too. In a given year a worthy charter operator could be poised to open a new school under an 18 percent NSS cap but ultimately be unable to do so because the district exits the bottom 10 percent before the school can open. Acting Commissioner Jeff Wulfson outlined this problem in a February 2018 memo to the board. Discussing a proposal for a new charter school in Lynn, he wrote:

“...the application for Equity Lab Charter School substantially met the criteria for approval. However, I was unable to recommend this school for a charter because of upcoming changes in the NSS cap for Lynn.”


A problem similar to what Wulfson described exists in other communities that are on the cusp of exiting the bottom 10 percent, such as Everett. In 2015 and 2016 Everett was ranked among the lowest performing districts in the Commonwealth. As of 2017 it exited the bottom 10 percent, in large part due to its SGP. This may seem like a win for Everett families, but it’s not.

The Pioneer Charter School of Science (PCSS) operates three campuses in Revere, Everett, and Chelsea, Massachusetts. As of January 2019, the organization reported 435 individual students on waitlists from Everett alone. Because Everett’s net school spending cap on charter schools was cut in half in 2018, PCSS’s Everett campus can admit from its waitlist only siblings of enrolled students (who have preference), and the waitlist continues to grow.

PCSS’s waitlisted students aren’t the only people affected by the uncertainty associated with Everett’s ranking as a low-performing district. Enrolled students suffer as well. With the reduction in Everett’s NSS cap, PCSS feared in 2018 that it could lose up to $200,000 due to the forced decline in the number of students it can enroll. Such financial uncertainty is difficult for any school, because enrollment is the major budget driver, dictating everything from staffing to course and extracurricular offerings. To its credit, DESE realized the severity of the financial blow PCSS might take and made adjustments, mainly to the sibling reimbursement formula, to protect the organization. But this scenario illustrates yet another negative impact of the regulations that dictate how the department determines the NSS cap.

The lowering of the charter school cap in communities like Everett and Lynn gets to the heart of a very politicized charter school debate. Advocates of including SGP in the formula for determining low performance do not want to “punish” districts that have demonstrated progress with student populations (mainly low-income) that they deem “harder to educate.” For their part, charter advocates see the inclusion of SGP as a mechanism to deny students access to charter public schools, even though the traditional district is failing them.

But both these narratives are simplistic, and neither poses the most meaningful questions: What measure or measures will provide the most reliable information about which districts are eligible for an increase in the net school spending cap? What measure or measures will provide additional opportunities for families, aligned with the spirit of the 2010 law?

Absolute proficiency may not be the best indicator of whether a school or district is making progress educating students, but it does provide an indication of whether schools and districts are able to educate students to a minimum standard.

Student growth percentiles are, on the other hand, a much more complicated measure. Parents, policymakers, and even educators assume they are a reliable measure of how much a teacher, school, or district has “moved” a student up the ladder of proficiency. But this is not an accurate description of SGP. According to an increasing number of psychometricians, SGP is widely misunderstood and widely misused.

In Massachusetts, reliance on SGP for making high-stakes policy decisions has made access to charter schools more unpredictable. SGP causes districts to enter and exit the bottom 10 percent with more frequency than they would if absolute proficiency remained the only measure of performance. This is because SGP measurements include a large degree of statistical error.

This paper discusses the reliability and impact of SGP as a factor in determining whether Massachusetts school districts are subject to an increase in the charter school cap. To illustrate the point, the authors use publicly available data to demonstrate how factoring SGP into the formula for determining the NSS cap moves districts with higher enrollment out of the bottom 10 percent. This doesn’t happen because these districts are showing dramatic achievement growth; rather, when enrollment is high, SGP scores are more likely to cluster around a “typical mean.” Conversely, districts with lower enrollments are more likely to have volatile SGP scores, which can move them into the bottom 10 percent even when absolute achievement is (comparatively) better than it is in high-enrollment districts.


The Student Growth Percentile and Accountability

In recent years, states have begun to use a student growth percentile as one way of reporting student, school, and district performance on standardized tests. In the decade following implementation of the No Child Left Behind Act, which required states to employ high-stakes standardized assessments in return for federal funding, some stakeholder groups were concerned that reporting student test scores in terms of proficiency alone (whether a student meets a pre-defined standard on an assessment) was misleading.

A common argument against relying on proficiency alone is that it does not help teachers, students, parents, or policymakers understand how much progress a student has made over time. A student who enters school far behind her peers may progress rapidly with the help of an effective teacher but still be unable to pass a standardized test. Conversely, a student who enters with grade-level skills could make comparatively little academic progress over the same amount of time but still reach proficiency on an examination. In this scenario, it is easy to see why measuring growth, instead of just proficiency, is desirable.

Measuring growth is particularly compelling for schools and districts that serve concentrated populations of low-income students and/or those that see wide achievement gaps when students enter school. Because proficiency rates correlate with socio-economic status (low-income children are more likely to struggle to meet proficiency), measuring growth feels more just to some educators and policymakers. Critics of measuring proficiency alone also point out that the pressure associated with helping students score proficient on tests gives educators an incentive to devote disproportionate attention to “bubble kids,” those who can pass the test with enough help.

This was, in part, what DESE reasoned when it first began reporting growth in 2011. The Department’s MCAS Student Growth Percentiles Interpretive Guide from that year states:


Measuring student performance relative to standards specific to each grade level is useful in determining whether a student has met the standards for that grade. There are, however, several obstacles to using this approach to measure students’ academic growth. This is why we have developed “student growth percentiles,” a measure of student progress that compares changes in a student’s MCAS scores to changes in MCAS scores of other students with similar scores in prior years. A student growth percentile measures student progress by comparing one student’s progress to the progress of other students with similar MCAS performance histories. We refer to students with similar score histories as “academic peers.”
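The “academic peers” idea can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical version: it treats students with exactly the same prior score as peers and reports each student’s mid-rank percentile among them, clipped to 1 to 99. The published SGP methodology instead estimates conditional percentiles with quantile regression over one or more prior years, so this sketch should not be read as DESE’s calculation; the function name and student records are invented.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def simple_growth_percentiles(records):
    """Hypothetical simplified SGP.

    records: iterable of (student_id, prior_score, current_score).
    Returns {student_id: percentile from 1 to 99}, ranking each student's
    current score only against peers with the identical prior score.
    """
    records = list(records)
    peers = defaultdict(list)
    for _, prior, current in records:
        peers[prior].append(current)
    for scores in peers.values():
        scores.sort()

    result = {}
    for student_id, prior, current in records:
        scores = peers[prior]
        below = bisect_left(scores, current)          # peers who scored lower
        ties = bisect_right(scores, current) - below  # peers (incl. self) with same score
        percentile = 100 * (below + 0.5 * ties) / len(scores)  # mid-rank
        result[student_id] = min(99, max(1, round(percentile)))
    return result


students = [
    ("s1", 240, 230), ("s2", 240, 240), ("s3", 240, 250),
    ("s4", 240, 260), ("s5", 240, 270),
    ("s6", 220, 245),  # the only student with a prior score of 220
]
print(simple_growth_percentiles(students))
# {'s1': 10, 's2': 30, 's3': 50, 's4': 70, 's5': 90, 's6': 50}
```

Student s6, with no peers at all, lands at 50 by construction: a small-sample artifact that echoes the enrollment concerns raised in this paper.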


Along with other measures, DESE reports student-level growth scores on parent/guardian reports. These individual student growth percentiles are also aggregated to the classroom, school, and district levels. Since 2011, DESE has used the median of the individual scores being aggregated to report school and district-level SGP. Under its new accountability system, piloted in 2018, aggregate SGP scores will be reported as the mean of individual scores.


In its explanation of this “growth model,” DESE goes to great lengths to assure stakeholders that SGP is just one part of a more holistic approach to how it understands student, school, and teacher performance. However, there are high stakes attached to SGP scores. Aggregated SGPs may be used at the school and district level for accountability purposes (including teacher evaluations), and as of 2018 the Commonwealth formally relies upon SGP as one part of a formula to rate districts and hold them accountable for performance. SGP scores can influence which districts and schools might be candidates for state intervention or “turn-around” and which districts and schools no longer need intervention and/or support.

Massachusetts isn’t alone in its use of SGP as a factor in determining school performance. When it was first introduced around 2009, the federal government quickly embraced the idea, encouraging states to incorporate some measure of student growth in Race to the Top applications and plans for the Every Student Succeeds Act (ESSA). As of 2018, a majority of states include some measure of student growth in their federal ESSA plans, though not all use SGP as the reporting tool.

But in Massachusetts and nationwide, the rapid rise of SGP as a reporting tool and a mechanism for high-stakes policy decisions is troubling. A growing body of research suggests that practitioners and policymakers should better understand the utility, risks, and benefits of this sophisticated measure.

Sireci, Wells & Keller of the Center for Educational Assessment at the University of Massachusetts Amherst summarize several problems with SGP, ranging from the practical to the very technical. They argue that SGP is a widely misunderstood reporting tool; parents don’t understand what a “growth percentile” is, and teachers don’t know how to use SGP to inform practice. They also contend that SGP is an unreliable measure, with “no validity evidence” to support its use. Although the authors limit their discussion of SGP to the student and classroom levels, their rationale also applies to SGP use at the district level. They forcefully argue that states should abandon SGP for both reporting and evaluation/accountability purposes.

Some researchers disagree with Sireci, Wells & Keller. They acknowledge that SGP is not a perfect way to measure student growth but argue that the information it can provide is useful enough to warrant its continued use. Andrew Ho of Harvard (who also advises DESE on its use of SGP) responded to Sireci, Wells & Keller, writing: “...the sufficiency of SGP reliability (or any score reliability) depends upon the intended interpretations and uses of SGPs.” In the view of Ho and others, if we are transparent about what SGP is and thoughtful about how we use it, SGP can complement other measures of student achievement.

Assessments of SGP reliability derive from an increasingly large body of literature, which notes that SGP scores suffer from a wide margin of error, especially at the most granular (student and teacher) reporting levels. Several studies in the past few years report that a 95 percent confidence interval for SGP scores is roughly 50 points wide. In lay terms, if a school’s reported median SGP is 50, researchers can be 95 percent confident that the school’s actual median SGP is somewhere between 25 and 75. Five percent of the time, even this wide range will be incorrect.

With SGPs reported on a 1-99 percentile scale for students,
schools, and districts, 50 points is a very large margin of error.
Put differently, if one school received an SGP of 30 and anoth- 
er a 70, researchers couldn't be confident that the school with 
the higher score actually helped students grow more than the 
school with the lower score. 
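To make the comparison concrete, the short sketch below works through the interval arithmetic. It is a hypothetical illustration, not DESE’s method: it assumes a symmetric 95 percent interval of plus or minus 25 points around any reported SGP (matching the roughly 50-point width cited above), clips intervals to the 1-99 scale, and uses a normal approximation (z = 1.96) only to show the standard error such an interval would imply.

```python
# Hypothetical illustration: assumes a symmetric 95% interval of +/- 25 SGP
# points around a reported score, clipped to the 1-99 scale. Not DESE's method.
HALF_WIDTH = 25
SGP_MIN, SGP_MAX = 1, 99


def interval(sgp: float, half_width: float = HALF_WIDTH) -> tuple[float, float]:
    """Return (low, high) bounds of the assumed interval around a reported SGP."""
    return max(SGP_MIN, sgp - half_width), min(SGP_MAX, sgp + half_width)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def implied_standard_error(half_width: float = HALF_WIDTH, z: float = 1.96) -> float:
    """Standard error implied by a normal-approximation 95% interval of this half-width."""
    return half_width / z


school_a = interval(30)  # (5, 55)
school_b = interval(70)  # (45, 95)
print(school_a, school_b, overlaps(school_a, school_b))  # (5, 55) (45, 95) True
print(round(implied_standard_error(), 1))  # 12.8
```

The two intervals share the stretch from 45 to 55, so a true SGP anywhere in that band is consistent with both reported scores.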

Researchers have been looking for ways to mitigate this
error, and there is evidence that aggregating SGP scores (for
example, to the district level) makes them somewhat more
reliable. Other research suggests that using the mean, as
opposed to the commonly used median, to measure SGP could
mitigate reliability issues. Castellano & Ho™ find that the mean
has lower sampling variability than the median, making it “a more
attractive aggregation function.” This is one reason why the
Commonwealth will begin to calculate SGP using the mean as
opposed to the median.
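A small simulation can illustrate why the choice of aggregation function matters. The sketch below assumes, purely for illustration, that individual student SGPs are independent uniform draws from 1 to 99 (as they would be if a group had no real effect on growth), and compares how much the mean and the median of a hypothetical 200-student group vary across repeated samples. It is not a reproduction of Castellano & Ho’s analysis.

```python
import random
import statistics


def sampling_sd(aggregate, n_students: int, trials: int = 1000, seed: int = 0) -> float:
    """Spread (standard deviation) of an aggregate of simulated student SGPs.

    Assumption: each student's SGP is an independent uniform draw from 1-99,
    i.e. the simulated group has no true above- or below-average growth.
    """
    rng = random.Random(seed)
    results = [
        aggregate([rng.randint(1, 99) for _ in range(n_students)])
        for _ in range(trials)
    ]
    return statistics.pstdev(results)


# Same seed, so both aggregates are computed on identical simulated groups.
sd_mean = sampling_sd(statistics.mean, n_students=200)
sd_median = sampling_sd(statistics.median, n_students=200)
print(f"SD of mean SGP across samples:   {sd_mean:.2f}")
print(f"SD of median SGP across samples: {sd_median:.2f}")
```

Under these assumptions the mean varies noticeably less than the median from sample to sample. Real SGP distributions are not exactly uniform, so the size of the gap here is illustrative only.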

But this move will only mitigate some of the error inherent
in SGP. This measure, like any other, is not perfect. Given this,
DESE should be more transparent about the limits of SGP as an
evaluation and reporting mechanism. Both the department and
the board should also be wary of overemphasizing SGP within
a larger system of accountability and of using SGP for
high-stakes policy decisions.

Using SGP as part of the formula that determines
which districts are eligible for a charter cap increase is a very
high-stakes policy decision, and over the years the consequences
of incorporating SGP into the formula to determine
the lowest performing 10 percent of districts have become
clear. Including SGP in the formula distorts which school
districts appear to be the lowest performing.


The Impact of SGP on Calculating the Lowest Performing Districts in MA


In any given year since 2011, a cursory scan of school districts
that fall into the bottom 10 percent on MCAS performance
reveals a handful of districts, many of them “gateway
districts,” that have difficulty helping students reach proficiency
in English, math, and science. Districts with very low absolute
performance on MCAS almost always fall into the bottom of
the bottom 10 percent.


The “Smart” Cap 


When districts fall into the lowest 10 percent, as measured 
by proficiency on standardized tests of English, math, and 
science (75 percent) and SGP (25 percent), the amount of 
tuition the district can send to charter schools rises from 9 
percent to 18 percent of net school spending. Any new 
charter school seats in these districts must be awarded to 
“proven providers.” 
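The arithmetic of the smart cap can be sketched in a few lines. The net school spending figure below is hypothetical, the sketch ignores every other provision of the law, and the rounding rule for the size of the bottom 10 percent (rounding 28.9 up to 29 of 289 districts) is an assumption consistent with the 29 districts listed in Chart 2, not a documented DESE rule.

```python
import math

TOTAL_DISTRICTS = 289  # number of ranked districts in the 2018 charts


def bottom_decile_size(total: int = TOTAL_DISTRICTS) -> int:
    """Number of districts in the lowest 10 percent (assumes rounding up)."""
    return math.ceil(total / 10)


def charter_tuition_cap(net_school_spending: int, in_bottom_10: bool) -> int:
    """Maximum charter tuition in dollars: 9% of NSS, or 18% in bottom-10% districts."""
    percent = 18 if in_bottom_10 else 9
    return net_school_spending * percent // 100  # integer math avoids float rounding


nss = 100_000_000  # hypothetical district net school spending, in dollars
print(bottom_decile_size())             # 29
print(charter_tuition_cap(nss, False))  # 9000000
print(charter_tuition_cap(nss, True))   # 18000000
```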


But in 2015, with the introduction of SGP as 25 percent of 
the formula for determining the lowest performing school 
districts, an interesting pattern began to emerge. Some of the 
largest Massachusetts school districts—ones with very low 
proficiency scores but middling to high growth scores—were 
steadily moving up in the overall rankings. Some moved out 
of the bottom 10 percent altogether. Curiously, districts that 
moved into the bottom, mainly due to very low SGP scores, 
tended to be very small, enrolling comparatively few students. 
A high-level analysis of what happens when SGP is and is
not included as a factor for determining the lowest performing 
districts suggests that there is a relationship between SGP 
calculations and the size of school districts. Put another way, 
school districts with comparatively low enrollment (in most 
cases below or far below 2,500) are more likely than their 
larger counterparts to have a median SGP that falls outside a 
“typical” range. 

Existing literature on SGP suggests a relationship between 
sample size and the precision of aggregate SGP estimates. 
Culbertson finds that SGP estimates are less precise for 
high- and low-achieving students than for those with aver- 
age achievement when the total sample size is small. He also 
notes that some steps states take to reduce this error, such as 
categorizing students into performance bands based on prior 
achievement before estimating SGP, may reduce error slightly 
but have other trade-offs, such as “reducing the similarity of 
students whose growth is compared.”** 

The authors tested the observation of a relationship
between non-typical SGP and low enrollment by identifying
high- and low-enrollment districts in the Commonwealth and
plotting their SGP and proficiency scores (represented as the
“scaled score” reported by DESE) to see if a pattern emerged.**
The authors classified Massachusetts districts where fewer than
1,000 students were included in the 2018 Next Generation MCAS
as low enrollment and districts where more than 1,000 students
were included in the exam as high enrollment. Neither group’s
sample size is large enough to sufficiently reduce the wide margin
of error inherent in SGP, but the analysis below shows that
districts with higher enrollment are more likely to cluster around
what DESE refers to as a “typical” median of 40-60 SGP.*”
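The two groupings used in this analysis can be written down as simple rules. This is a sketch of the definitions as described in the text; the function names and the example values are hypothetical, and two boundary choices are assumptions the text does not settle: a district with exactly 1,000 tested students is left unclassified, and the “typical” range is treated as inclusive of 40 and 60.

```python
from typing import Optional

ENROLLMENT_THRESHOLD = 1_000  # tested students in the 2018 Next Generation MCAS
TYPICAL_LOW, TYPICAL_HIGH = 40, 60  # DESE's "typical" median SGP range


def enrollment_group(tested_students: int) -> Optional[str]:
    """'low' below 1,000 tested students, 'high' above; None at exactly 1,000 (assumption)."""
    if tested_students < ENROLLMENT_THRESHOLD:
        return "low"
    if tested_students > ENROLLMENT_THRESHOLD:
        return "high"
    return None


def is_typical_sgp(sgp: float) -> bool:
    """True if an aggregate SGP lies within 40-60 (inclusive, an assumption)."""
    return TYPICAL_LOW <= sgp <= TYPICAL_HIGH


# Hypothetical districts: (tested students, aggregate SGP)
print(enrollment_group(850), is_typical_sgp(36.2))    # low False
print(enrollment_group(4_200), is_typical_sgp(47.2))  # high True
```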


Graph 1: 2018 MCAS ELA, SGP and Scaled Scores for High Enrollment Districts”
Graph 3: 2018 MCAS Math, SGP and Scaled Scores for High Enrollment Districts®
Graph 4: 2018 MCAS Math, SGP and Scaled Scores for Low Enrollment Districts”
[Each graph plots district SGP (horizontal axis) against MCAS scaled score (vertical axis, approximately 475 to 520).]

Publicly available data provided by Massachusetts Department of Elementary and Secondary Education; authors’ analysis




AN ANALYSIS OF HOW MASSACHUSETTS’ “STUDENT GROWTH” MODEL LIMITS ACCESS TO CHARTER PUBLIC SCHOOLS 


These visuals suggest a correlation between a district’s
enrollment and whether its SGP is typical (between 40 and 60)
or non-typical (less than 40 or greater than 60). This pattern
suggests that districts with higher enrollment may receive a
higher (or “better”) SGP score because higher enrollment makes
them more likely to have an aggregate SGP score close to the
typical median of 50 (even though this may not be their “true”
SGP). Districts with lower enrollment are more volatile, and
therefore more likely to show either very high or very low
growth, and less likely to cluster around the typical median.
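The mechanism described here can be illustrated with a simple simulation. Under the assumption that a district has no true effect on growth, so that each tested student’s SGP is an independent uniform draw from 1 to 99, the sketch estimates how often the district median lands outside the typical 40-60 range purely by chance. The group sizes are hypothetical, and the model ignores everything else about how DESE actually computes SGP.

```python
import random
import statistics


def share_atypical(n_students: int, trials: int = 500, seed: int = 1) -> float:
    """Fraction of simulated districts whose median SGP falls outside 40-60.

    Assumption: student SGPs are independent uniform draws from 1-99, so any
    departure of a district's median from about 50 is pure sampling noise.
    """
    rng = random.Random(seed)
    atypical = 0
    for _ in range(trials):
        median_sgp = statistics.median(rng.randint(1, 99) for _ in range(n_students))
        if not 40 <= median_sgp <= 60:
            atypical += 1
    return atypical / trials


# Hypothetical group sizes: a very small, a mid-sized, and a larger district.
for n in (25, 250, 1000):
    print(f"{n:>5} tested students: {share_atypical(n):.1%} atypical by chance")
```

Every simulated district has the same true growth, so any atypical medians, which appear far more often in the smallest group, arise from sampling noise alone.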

These visuals suggest that the already large amount of random
error inherent in SGP calculations is exacerbated when sample
size is insufficient. A measure of district academic performance
should not depend on something as unrelated as sample size, so
any such relationship is likely spurious; this correlation therefore
suggests that DESE and outside researchers should undertake
further study.*8


Chart 2
Lowest Performing Districts by Achievement Only (2018)

Overall Rank  District           Overall Achievement  Overall Growth
(1-289)                          Rank (of 289)        Rank (of 289)
2             Holyoke            1                    9
2             Southbridge        2                    6
1             Webster            3                    1
7             Chelsea            4                    26
5             Springfield        5                    11
14            Brockton           6                    73
4             New Bedford        7                    3
6             Gardner            8                    5
26            Lawrence           9                    132
12            Athol-Royalston    10                   41
27            Orange             11                   127
10            Fitchburg          12                   30
13            Boston             13                   41
8             North Adams        14                   4
37            Worcester          15                   136
19            Wareham            16                   91
28            Adams-Cheshire     17                   111
9             Winchendon         18                   2
57            Randolph           19                   201
11            Taunton            20                   8
34            Fall River         21                   110
43            Everett            22                   139
40            Lowell             23                   116
51            Lynn               24                   159
21            North Brookfield   25                   71
31            Gill-Montague      26                   88
15            Ware               27                   24
18            Haverhill          28                   49
49            Salem              29                   138
This relationship also raises two very important policy
questions: Are larger districts that have low proficiency but
high growth exiting the bottom 10 percent because their student
outcomes are actually improving? Or are smaller districts
entering the bottom 10 percent, essentially displacing those
larger districts that tend to cluster around a typical median SGP,
because their SGP scores are even more unreliable than the SGP
scores of the larger districts? Either way, a high-stakes policy
that rewards and punishes districts based in part on SGP is
advancing a disingenuous proposition that any of these districts
are performing either “better” or “worse” than others.

Using the most recent DESE data (2018), Chart 2 below 
shows the difference between districts that fall into the bottom 
10 percent based on proficiency alone and districts that fall 
into the bottom 10 percent when SGP comprises 25 percent of 
the formula. It reveals that many low-enrollment districts that 
enter the bottom 10 percent based on low growth scores rank 
slightly higher when growth is not a factor. 


Lowest Performing Districts by Achievement and SGP (2018)

Overall Rank  District           Overall Achievement  Overall Growth
(1-289)                          Rank (of 289)        Rank (of 289)
1             Webster            3                    1
2             Holyoke            1                    9
2             Southbridge        2                    6
4             New Bedford        7                    3
5             Springfield        5                    11
6             Gardner            8                    5
7             Chelsea            4                    26
8             North Adams        14                   4
9             Winchendon         18                   2
10            Fitchburg          12                   30
11            Taunton            20                   8
12            Athol-Royalston    10                   41
13            Boston             13                   41
14            Brockton           6                    73
15            Ware               27                   24
16            Greenfield         31                   15
17            Pittsfield         30                   25
18            Haverhill          28                   49
19            Wareham            16                   91
20            Easthampton        43                   13
21            North Brookfield   25                   71
22            Marlborough        34                   46
23            Rockland           38                   39
24            Leicester          45                   19
25            Palmer             41                   35
26            Lawrence           9                    132
27            Orange             11                   127
28            Adams-Cheshire     17                   111
29            Hawlemont          51                   11
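The published overall ranks are consistent with a simple weighted combination of the two component ranks. As a sketch (an inference from the chart, not DESE’s documented procedure), weighting each district’s achievement rank by 0.75 and its growth rank by 0.25, then ranking the results so that ties share a rank, reproduces the overall ranks of the ten lowest-ranked districts in the table above. Multiplying both weights by 4 (3 × achievement rank + growth rank) gives the same ordering while keeping the arithmetic in whole numbers. Springfield’s growth rank is read as 11, the value consistent with its overall rank where the source scan is partly illegible.

```python
# (district, overall achievement rank, overall growth rank) for the ten
# lowest-ranked districts in the 2018 chart that includes SGP.
DISTRICTS = [
    ("Webster", 3, 1),
    ("Holyoke", 1, 9),
    ("Southbridge", 2, 6),
    ("New Bedford", 7, 3),
    ("Springfield", 5, 11),
    ("Gardner", 8, 5),
    ("Chelsea", 4, 26),
    ("North Adams", 14, 4),
    ("Winchendon", 18, 2),
    ("Fitchburg", 12, 30),
]


def composite(achievement_rank: int, growth_rank: int) -> int:
    """4 * (0.75 * achievement rank + 0.25 * growth rank), kept in whole numbers."""
    return 3 * achievement_rank + growth_rank


def competition_ranks(rows: list[tuple[str, int, int]]) -> dict[str, int]:
    """Rank by composite score, lowest first; ties share the better rank (1, 2, 2, 4)."""
    scores = {name: composite(ach, growth) for name, ach, growth in rows}
    ordered = sorted(scores.values())
    return {name: ordered.index(score) + 1 for name, score in scores.items()}


ranks = competition_ranks(DISTRICTS)
for name, ach, growth in DISTRICTS:
    print(f"{ranks[name]:>2}  {name:<12} (achievement {ach}, growth {growth})")
```

Holyoke (3 × 1 + 9 = 12) and Southbridge (3 × 2 + 6 = 12) tie, which matches the shared rank of 2 in the chart.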


Another way to view the data is to ask which districts exit
the bottom 10 percent because of higher growth scores and
which take their place due to lower growth scores. Chart 3
below shows districts that exit the bottom 10 percent when
growth is a factor and the districts that enter the bottom
10 percent to take their place. The chart also shows total
enrollment for each district as well as the number of students
that, according to the state, are on charter school waitlists in
each district. Importantly, the total enrollment provided here
is a larger number than that which the state would use to
calculate SGP, because not all students in the districts will
take standardized examinations that count toward the
aggregate SGP score.

Chart 3 below gives a clearer picture of the influence
enrollment likely has on SGP. Of the districts that exit the
bottom 10 percent when SGP is a factor, five of the eight have
total student enrollment of more than 5,000, and all but one
have enrollment greater than 2,500 students. In fact, four of the
districts listed here (Lynn, Lowell, Worcester, and Fall River) are
among the largest in the state. Of those that enter the bottom
10 percent, only one, Pittsfield, has total enrollment greater
than 5,000 (and it is very likely that the number of students
included in the state’s SGP calculation for Pittsfield is not close
to 5,000). Perhaps more telling, 75 percent of the districts that
enter the bottom 10 percent when growth is a factor (six of the
eight) have total student enrollments below 2,500. At least five
of these districts (Hawlemont, Easthampton, Greenfield, Palmer,
and Leicester) are among the smallest in the state.


Another indicator is the difference in the range of growth 
scores between the larger districts that exit the bottom 10 per- 
cent and smaller districts that enter the bottom 10 percent. 
As the literature suggests, larger districts that exit the bottom 
10 percent cluster around the “typical” median, with a range 
of SGP percentile scores between 46.1 and 52 (6 points). The 
range among districts that enter the bottom 10 percent is con- 
siderably wider, between 36.2 and 56.4 (20 points). 
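The counts and ranges in the last two paragraphs can be checked directly against the enrollment and SGP figures reported in Chart 3. The short script below simply recomputes them from those figures; like the chart, it uses total enrollment, which overstates the number of students in each district’s SGP calculation. The helper names are illustrative.

```python
# (district, total enrollment, SGP %) from Chart 3 (2017-18 MCAS data).
EXIT = [
    ("Salem", 3_694, 52.0),
    ("Lynn", 15_517, 47.2),
    ("Lowell", 14_436, 46.1),
    ("Everett", 7_068, 48.4),
    ("Randolph", 2_823, 51.1),
    ("Worcester", 25_306, 46.2),
    ("Fall River", 10_128, 46.6),
    ("Gill-Montague", 976, 49.9),
]
ENTER = [
    ("Hawlemont", 163, 36.2),
    ("Rockland", 2_193, 56.4),
    ("Marlborough", 4_575, 50.9),
    ("Easthampton", 1_541, 46.0),
    ("Pittsfield", 5_464, 41.8),
    ("Greenfield", 1_699, 46.1),
    ("Palmer", 1_400, 49.1),
    ("Leicester", 1_569, 49.8),
]


def sgp_range(rows):
    """(lowest SGP, highest SGP, spread rounded to one decimal) for a group."""
    sgps = [sgp for _, _, sgp in rows]
    return min(sgps), max(sgps), round(max(sgps) - min(sgps), 1)


def count_enrollment(rows, predicate):
    """Number of districts whose total enrollment satisfies the predicate."""
    return sum(1 for _, enrollment, _ in rows if predicate(enrollment))


print("exit range:", sgp_range(EXIT))    # (46.1, 52.0, 5.9)
print("enter range:", sgp_range(ENTER))  # (36.2, 56.4, 20.2)
print("exit > 5,000:", count_enrollment(EXIT, lambda e: e > 5_000), "of", len(EXIT))
print("exit > 2,500:", count_enrollment(EXIT, lambda e: e > 2_500), "of", len(EXIT))
print("enter < 2,500:", count_enrollment(ENTER, lambda e: e < 2_500), "of", len(ENTER))
```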

A closer look at SGP within districts reveals even wider
differences. In Worcester, the largest district in this analysis,
school-level aggregate SGP percentiles range from 22.3 on the
low end to 72.7 on the high end. In Easthampton, the smallest
district for which enough data are available to show a range of
aggregate SGP percentiles between schools, the percentiles
range from 40.8 on the low end to 63.5 on the high end. This
suggests that, despite having a median SGP within the typical
range, a large district like Worcester is home to many schools
that theoretically have much lower SGP scores (we don’t know
if they are “true” SGP) but will nonetheless exit the bottom
10 percent because the entire district skews toward the median.

This analysis does not suggest that districts that enter
the bottom 10 percent are high performing; all the districts
presented in this analysis have comparatively low overall
proficiency rates. Instead, it shows that districts with higher
enrollment are more likely to have a typical SGP score,
regardless of whether they are helping students achieve or
grow toward proficiency. Lower-enrollment districts, on
the other hand, are less likely to show a typical SGP score
and are therefore more likely to displace higher-enrollment
districts in the bottom 10 percent. If these findings hold,
students are not well served by the state’s formula for
determining which districts need additional support.

By law, one form of intervention or support that districts
become eligible for when they enter the bottom 10 percent
is additional charter school seats.


Chart 3: Districts that Exit and Enter the Bottom 10% with SGP in Formula (2017-18 MCAS Data)

Exit Bottom 10% with SGP in Formula
District       Total Enrollment  SGP %  Charter Waitlist
Salem          3,694             52
Lynn           15,517            47.2   1,464
Lowell         14,436            46.1   481
Everett        7,068             48.4   868
Randolph       2,823             51.1   427
Worcester      25,306            46.2   874
Fall River     10,128            46.6   594
Gill-Montague  976               49.9   0
SGP % Range: 46.1-52

Enter Bottom 10% with SGP in Formula
District       Total Enrollment  SGP %  Charter Waitlist
Hawlemont      163               36.2   0
Rockland       2,193             56.4   56
Marlborough    4,575             50.9   133
Easthampton    1,541             46     95
Pittsfield     5,464             41.8   0
Greenfield     1,699             46.1   46
Palmer         1,400             49.1   12
Leicester      1,569             49.8   7
SGP % Range: 36.2-56.4


Conclusions and Recommendations 


Massachusetts has some of the highest performing charter 
schools in the country, but district net school spending caps 
inhibit access and innovation. Under the law, districts must 
abjectly fail before the NSS cap increases in high-demand dis- 
tricts. This policy frames charter schools as an escape valve, 
which is unhealthy for the public-school community (charter 
public schools included). 

More importantly, regulations governing how the state
determines the NSS cap have created a situation in which
access to charter schools in some communities exists for only
a short window of time. This deters prospective charter
operators, creates financial and other uncertainties for existing
providers, and confuses and frustrates parents and students on
charter school waiting lists.

The analysis presented in this paper suggests that the current
formula for determining districts that are eligible for an
increase in the charter school cap is unreliable and volatile.
The authors find a relationship between SGP and district
enrollment, which suggests that statistical error influences
district SGP scores.

When SGP accounts for 25 percent of the formula for
determining the lowest performing districts, those with
comparatively high enrollment are more likely to exit the
bottom 10 percent when SGP is high but overall proficiency
is low. Conversely, districts with comparatively low enrollment
are more likely to enter the bottom 10 percent because their
SGP scores are less likely to fall within the typical range.
This can happen even when overall proficiency scores in
these districts are comparatively higher. Because SGP is a
less reliable measure than proficiency alone, districts that
may be due access to more charter schools under the spirit of
the law are being denied such access.

The analysis has limitations. Notably, the authors do not
have access to information about how the state’s formula for
determining SGP may mitigate (or attempt to mitigate) the
random error inherent in SGP calculations and/or the error
that can accompany small sample size. Moreover, the
conclusions the authors draw about the relationship between
SGP and enrollment derive from a very high-level analysis
of the publicly available data, with limited information about
the precise sample size included in DESE’s actual SGP
calculations. The findings of this paper suggest that the state
and other independent researchers should further investigate
the relationship between aggregate SGP (whether median or
mean) and enrollment.

Factoring demand for charters into this analysis is important,
because a number of the districts that exit the bottom 10 percent
when SGP is a factor have comparatively high demand (in the
form of waitlists) for charter schools. Conversely, many of the
districts that enter the bottom 10 percent when growth is a
factor have few if any students on charter waitlists. This could
indicate a total lack of desire for charter schools, or that there
are few if any charter school options in these communities.

The 2010 law did not explicitly direct more access to these public
schools toward communities with high charter demand, but in the
years immediately after the law was enacted, the same communities
that became eligible for an increase in the NSS cap also saw
high demand for charter schools. It was not until the board
voted to incorporate SGP into the formula for determining
the lowest performing districts that communities with limited
demand for charter schools became eligible for an increase in
the cap. The 2010 law does not require that BESE use the
same metrics for the state’s overall accountability system and
for determining the lowest performing districts that will
become eligible for an increase in the NSS cap.

Theoretically, BESE could employ two systems: one that
conservatively factors SGP into a district’s accountability
rating, and one that uses absolute proficiency alone (a more
reliable and stable measure) to determine which districts are
the lowest performing for purposes of the NSS cap. Using two
systems would not only yield useful, comparative information
for the state but would also address a serious equity issue by
granting increased access to charter schools in the communities
where parents most want them.

Moreover, using absolute proficiency as the only measure
for determining which communities are eligible for an increase
in the charter school cap is better aligned with the spirit of the
law than a policy that considers both growth and proficiency.
The intent of the 2010 law was to provide students and families
in districts where schools struggle to help students meet
proficiency with different public school options. It is time to
realize the legislature’s vision.


Recommendations 
Revert to a formula that uses proficiency in core subjects to
determine which districts are subject to an increase in the
charter school cap.

The growing body of literature on the student growth percentile
suggests it is not a reliable enough measure to use for
high-stakes policy decisions. Proficiency on standardized tests
should not be the only measure a teacher or school uses to
understand what students can do or how they are progressing.
However, proficiency scores in core subjects are more reliable
indicators of how a majority of students are faring in a district
and whether that district and its students need more support.

The legislature designed the “smart cap” specifically to
provide more charter public schools to families who live in
districts where student performance is low. Altering the original
formula to include SGP has distorted our understanding of which
districts are most in need. Reverting to a formula that uses
absolute proficiency as the only measure of achievement can
alleviate that distortion. Importantly, identification of the
lowest performing districts does not have to be part of the
state’s overall district accountability system. The state could
continue to hold districts accountable using both proficiency and
growth, while basing eligibility for an increase in the NSS cap
solely on proficiency.


Mitigate the negative impacts of current regulations as
much as possible.

In 2014, then-Commissioner Mitchell Chester warned that
overemphasizing SGP could distort our understanding of
which districts are most in need of support. Since that time,
other negative impacts of the new formula for determining
the lowest performing 10 percent have become clear, and
DESE has tracked those impacts and worked to mitigate
them, where possible. The department should continue to
proactively assess the financial impact decreased enrollments
can have on charter schools that draw students from districts
on the cusp of the bottom 10 percent. It should mitigate
negative financial impacts as much as possible, as declining
enrollment that results from a reduction in the NSS cap
is beyond a school’s control. DESE should also work closely
with prospective charter operators to identify pockets of
need and demand for charters that are less likely to be affected
by a volatile NSS cap. Both of the Commonwealth’s charter
school caps drive potentially high-quality operators out
of the state, but the department may be able to proactively
redirect prospective operators in the future.


Increase transparency around the meaning and use of SGP.

There is consensus in the research literature that SGP
measurements at all levels contain a degree (in some cases a
large degree) of statistical error. Some argue that the amount of
error is reason to abandon the measure. Others claim that all
statistical measures are vulnerable to error and would rather
use SGP, however imprecise, as one of many indicators that can
paint an overall picture of student performance for educators
and policymakers.

As this paper points out, SGP may be an informative measure,
but it should not be used for high-stakes policy decisions. No
matter how it is used, the state should be transparent with all
stakeholders about its limitations. SGP alone cannot tell
policymakers whether students are receiving an adequate
education or equitable educational opportunities. SGP alone
cannot accurately tell parents how much knowledge or skill their
children have gained in the course of a year. SGP is an estimate
(a comparative tool), and DESE should work hard to educate all
stakeholders about what it can and cannot tell them about
student, school, and district performance.